
fix(chunking): Fallback to StringChunker for Tree-sitter nodes with no children#145

Merged
Davidyz merged 2 commits into Davidyz:main from
Jufralice:handle_treesitter_nodes_with_no_children
May 16, 2025

@Jufralice
Contributor

When a Tree-sitter node has no children, the TreeSitterChunker would previously not yield any chunks for its content. This change adds a check for nodes with no children and falls back to using the StringChunker on the node's text, ensuring the content is processed.

This is useful, for example, for files with embedded content that the primary Tree-sitter parser can't process. In Vue 3 single-file components (.vue), for instance, you can have embedded JavaScript logic within a tag, followed by your HTML template. The parser will parse the HTML part but will only see one 'RawText' node for the embedded JavaScript code. If this part is longer than your configured chunk_size, it won't be returned at all by the TreeSitterChunker, since this node has no children.

With this fix, the embedded content falls back to naive chunking, which is probably much better than being completely ignored.
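
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `Node`, `chunk_string`, and `chunk_node` are hypothetical stand-ins for a Tree-sitter node, the StringChunker, and the TreeSitterChunker, and the real chunkers do more (for example, tracking positions).

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a Tree-sitter node (not the real tree_sitter.Node)."""
    text: str
    children: List["Node"] = field(default_factory=list)


def chunk_string(text: str, chunk_size: int) -> Iterator[str]:
    """Naive fixed-width chunking, standing in for the StringChunker."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def chunk_node(node: Node, chunk_size: int) -> Iterator[str]:
    """Simplified stand-in for the TreeSitterChunker traversal."""
    if len(node.text) <= chunk_size:
        # The whole node fits into one chunk.
        yield node.text
    elif not node.children:
        # Oversized leaf (e.g. a 'RawText' node): without this branch
        # nothing would be yielded, so fall back to naive string chunking.
        yield from chunk_string(node.text, chunk_size)
    else:
        # Oversized node with children: descend into them.
        for child in node.children:
            yield from chunk_node(child, chunk_size)


# Assumed example: a small parsed template plus a large unparsed script body.
template = Node("<p>hi</p>")
raw_script = Node("x" * 50)  # leaf node, longer than chunk_size
root = Node(template.text + raw_script.text, [template, raw_script])

print([len(c) for c in chunk_node(root, chunk_size=20)])  # [9, 20, 20, 10]
```

Without the `elif not node.children` branch, the 50-character leaf would produce no chunks at all, which is the behaviour this PR addresses.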

In my Neovim environment, Tree-sitter somehow manages to parse the embedded content in .vue files, but here, for the TreeSitterChunker, the Tree-sitter 'Vue' grammar is apparently not responsible for handling the embedded content (I guess my nvim-treesitter plugin does some magic to merge multiple Tree-sitter parsers for this kind of embedded content...).

@codecov

codecov bot commented May 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.24%. Comparing base (d899df3) to head (2512462).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #145   +/-   ##
=======================================
  Coverage   99.24%   99.24%           
=======================================
  Files          22       22           
  Lines        1457     1462    +5     
=======================================
+ Hits         1446     1451    +5     
  Misses         11       11           

@Davidyz Davidyz added bug Something isn't working fix labels May 16, 2025
@Davidyz
Owner

Davidyz commented May 16, 2025

Hi, thanks for this PR! This is something that I realised a while ago but somehow forgot to add. I'll add some tests to this PR later (for the coverage) and then merge it.

I noticed that in (neo)vim, syntax/vue.vim simply loads the HTML syntax for parsing. If this works for the .vue files you're mentioning, I'm planning a filetype_map feature that lets you specify the parser to use based on the filename extension (in fact, I might take it a step further and allow matching on the full filename). This might also solve your problem, but from a different angle.
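
To illustrate the idea only: the feature is merely planned here, so the sketch below is a guess at what such a mapping could look like, not its actual design. `FILETYPE_MAP`, `pick_parser`, the regex-based matching, and the parser names are all assumptions made for this example.

```python
import os
import re
from typing import Dict, Optional

# Hypothetical mapping from a regex on the file name to a parser name.
# Matching the whole file name also covers files without an extension.
FILETYPE_MAP: Dict[str, str] = {
    r".*\.vue": "html",            # treat .vue files as HTML, as syntax/vue.vim does
    r"Dockerfile.*": "dockerfile",  # assumed parser name, for illustration only
}


def pick_parser(path: str, filetype_map: Dict[str, str]) -> Optional[str]:
    """Return the parser name for the first pattern matching the file name."""
    filename = os.path.basename(path)
    for pattern, parser in filetype_map.items():
        if re.fullmatch(pattern, filename):
            return parser
    return None  # no override: the caller would use its default detection


print(pick_parser("src/components/App.vue", FILETYPE_MAP))
print(pick_parser("main.py", FILETYPE_MAP))
```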

@Davidyz
Owner

Davidyz commented May 16, 2025

Ok, I just remembered why I decided to put this off... and this is actually a bug that should be fixed. The StringChunker doesn't compute the row and column correctly for multi-line strings. The neovim plugin doesn't use this yet, but I want to make this right (maybe in a later PR, because it may involve changes to other parts of the fallback mechanism).
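
For reference, one way to track row and column across multi-line chunks is sketched below. It is a minimal illustration and not the StringChunker's code: `Position`, `advance`, and `chunk_with_positions` are hypothetical names, positions are assumed to be 0-based, and each end position is assumed to be exclusive.

```python
from typing import Iterator, NamedTuple, Tuple


class Position(NamedTuple):
    row: int     # 0-based line index
    column: int  # 0-based offset within that line


def advance(pos: Position, text: str) -> Position:
    """Return the position reached after consuming `text` starting at `pos`."""
    newlines = text.count("\n")
    if newlines == 0:
        # Still on the same line: only the column moves.
        return Position(pos.row, pos.column + len(text))
    # The column restarts after the last line break inside `text`.
    return Position(pos.row + newlines, len(text) - text.rfind("\n") - 1)


def chunk_with_positions(
    text: str, chunk_size: int, start: Position = Position(0, 0)
) -> Iterator[Tuple[str, Position, Position]]:
    """Yield (chunk, start, end) triples, where `end` is exclusive."""
    pos = start
    for i in range(0, len(text), chunk_size):
        piece = text[i:i + chunk_size]
        end = advance(pos, piece)
        yield piece, pos, end
        pos = end


for piece, begin, end in chunk_with_positions("ab\ncdef\ng", chunk_size=4):
    print(repr(piece), tuple(begin), tuple(end))
```

Passing a non-zero `start` would let a fallback report positions relative to the whole file when it is invoked on the text of a single node.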

@Davidyz Davidyz merged commit dbe1abc into Davidyz:main May 16, 2025
11 checks passed
@Jufralice Jufralice deleted the handle_treesitter_nodes_with_no_children branch May 16, 2025 14:13
